Working with features. Total of [100 + 10 marks].¶

A) The code below is meant to return the positions (using zero-based numbering) of the 1020th to 1100th smallest elements of array a. It fails. By reshaping a, or otherwise, accomplish the task without using for loops. Total of [10 marks].

In [38]:
import numpy as np

a = np.array([list(range(1000)) + list(range(2000,1000,-1))])
np.argsort(a)[1020:1100]
Out[38]:
array([], shape=(0, 2000), dtype=int64)
In [39]:
# Answer A) here 
# [10 marks]

# a has shape (1, 2000), so np.argsort(a) sorts within each row and the
# row-slice [1020:1100] is empty. Flatten a to 1-D first.
a_reshaped = a.reshape(-1)
sorted_indices = np.argsort(a_reshaped)
# The 1020th smallest element sits at zero-based position 1019.
result_task_a = sorted_indices[1019:1100]

result_task_a
Out[39]:
array([1980, 1979, 1978, 1977, 1976, 1975, 1974, 1973, 1972, 1971, 1970,
       1969, 1968, 1967, 1966, 1965, 1964, 1963, 1962, 1961, 1960, 1959,
       1958, 1957, 1956, 1955, 1954, 1953, 1952, 1951, 1950, 1949, 1948,
       1947, 1946, 1945, 1944, 1943, 1942, 1941, 1940, 1939, 1938, 1937,
       1936, 1935, 1934, 1933, 1932, 1931, 1930, 1929, 1928, 1927, 1926,
       1925, 1924, 1923, 1922, 1921, 1920, 1919, 1918, 1917, 1916, 1915,
       1914, 1913, 1912, 1911, 1910, 1909, 1908, 1907, 1906, 1905, 1904,
       1903, 1902, 1901, 1900])
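An alternative sketch that avoids a full sort: `np.argpartition` does a partial sort, which is cheaper when only a small window of order statistics is needed (the variable names here are illustrative).

```python
import numpy as np

a = np.array([list(range(1000)) + list(range(2000, 1000, -1))])
flat = a.ravel()  # view the (1, 2000) array as 1-D

# Partially sort so that positions 1019..1099 hold the 1020th..1100th
# smallest elements in order, then take the indices in that window.
idx = np.argpartition(flat, list(range(1019, 1100)))[1019:1100]
```

Because every value in `a` is distinct, this yields exactly the same indices as the full `np.argsort` solution above.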

B) Find the two nearest neighbors of the 5th element in a. Total of [10 marks].

In [48]:
# create array a.
a = np.array([list(range(5000))])
a = np.reshape(a, [1000, 5])

# find nearest neighbors of the fifth element
all_d = []
for i in range(a.shape[0]):
    d = np.linalg.norm(a[i] - a[5])
    all_d.append(d)
np.argsort(all_d)[1:3]

# rewrite the above code to eliminate the for loop
Out[48]:
array([4, 6])
In [49]:
# Answer B) here 
# [10 marks]

import numpy as np

# Create a 2D NumPy array
matrix = np.array([list(range(5000))])
matrix = np.reshape(matrix, [1000, 5])

# Calculate vectorized Euclidean distances using NumPy operations
diff = matrix - matrix[5]
squared_diff = diff ** 2
distance = np.sqrt(np.sum(squared_diff, axis=1))

# Find the indices of the two closest points
sorted_indices = np.argsort(distance)
closest_indices = sorted_indices[1:3]

closest_indices
Out[49]:
array([4, 6])
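Another loop-free variant, sketched with `scipy.spatial.distance.cdist` (assuming SciPy is available, as it is elsewhere in this notebook): it computes all pairwise distances between two sets of row vectors in a single call.

```python
import numpy as np
from scipy.spatial.distance import cdist

matrix = np.arange(5000).reshape(1000, 5)

# Distances from every row to row 5, as a flat (1000,) vector.
dists = cdist(matrix, matrix[5:6]).ravel()

# Skip position 0 of the sort (row 5 itself, distance 0);
# the next two positions are the two nearest neighbors.
closest = np.argsort(dists)[1:3]
```

Rows 4 and 6 are equidistant from row 5, so the two of them come back as the nearest neighbors.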

C) CLIP embeds text and images into a shared feature space, so a text query can be compared directly against image features. Write code to convert the given images into CLIP features. Use them to retrieve and display the top 10 images for the keywords: "cat", "cat face", and "panda playing". Provide code for retrieval using Euclidean distance and cosine distance. Total of [50 marks].

Dataset: https://drive.google.com/drive/folders/1cq6Wj0KPvuvrjBAQEDQdmltwvzh9uKhY?usp=sharing

CLIP network: https://huggingface.co/ o/

In [19]:
pip install torchvision
Requirement already satisfied: torchvision in /Users/user/opt/anaconda3/lib/python3.9/site-packages (0.16.0)
Note: you may need to restart the kernel to use updated packages.
In [20]:
pip install transformers
Requirement already satisfied: transformers in /Users/user/opt/anaconda3/lib/python3.9/site-packages (4.34.0)
Note: you may need to restart the kernel to use updated packages.
In [21]:
# Write code to convert images to CLIP features.
# Save the feature array and ground truth labels to hard disk.
# [10 marks].

import os
from PIL import Image
from torchvision import transforms
import torch
from transformers import CLIPProcessor, CLIPModel
import numpy as np

# Initialize the CLIP model and processor
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")

# Load images from specified directory 'root_folder' into memory
root_folder = "/Users/user/klementgoh/Intro_to_AI/Assignment1B/animal_images"
categories = ["cats", "dogs", "panda"]
image_paths = []
labels = []

for category in categories:
    folder_path = os.path.join(root_folder, category)
    for filename in os.listdir(folder_path):
        if filename.endswith(('.jpg', '.png')):
            image_paths.append(os.path.join(folder_path, filename))
            labels.append(category)

images = [Image.open(path).convert("RGB") for path in image_paths]

# get_image_features runs only the image tower; normalizing afterwards matches
# the image_embeds that the full forward pass would return. No text inputs are
# needed for feature extraction.
image_inputs = processor(images=images, return_tensors="pt")
with torch.no_grad():
    image_features = model.get_image_features(**image_inputs)
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
image_features = image_features.numpy()

# Save the image features and labels to disk
np.save('image_features.npy', image_features)
np.save('image_labels.npy', labels)
In [23]:
# Load the feature array and ground truth labels from hard disk.
# [10 marks].

loaded_image_features = np.load('image_features.npy')
loaded_image_labels = np.load('image_labels.npy')
In [24]:
# Write code to retrieve "cat" here. 
# Display the first 10 retrieved images 
# [10 marks].

# Hint: First use the CLIP model to convert the text into a feature; then perform nearest neighbor retrieval
from scipy.spatial.distance import euclidean
import matplotlib.pyplot as plt

inputs_text = processor(text=["cat"], return_tensors="pt", padding=True, truncation=True)
with torch.no_grad():
    text_features = model.get_text_features(**inputs_text)
text_features = text_features.squeeze().detach().numpy() 

# Compute Euclidean distances for retrieval
euclidean_dists = [euclidean(text_features, img_feature) for img_feature in loaded_image_features]
top_10_euclidean_indices = np.argsort(euclidean_dists)[:10] 
top_10_images = [Image.open(image_paths[i]) for i in top_10_euclidean_indices]


fig, axes = plt.subplots(4, 3, figsize=(9, 12))
axes = axes.ravel()
for i in range(10):
    image = top_10_images[i]
    axes[i].imshow(np.asarray(image))
    axes[i].set_title(loaded_image_labels[top_10_euclidean_indices[i]])
    axes[i].axis('off')

# Turn off the axes for the remaining two positions
for i in range(10, 12):
    axes.ravel()[i].axis('off')

plt.tight_layout()
plt.show()
In [25]:
# Cosine similarity
from scipy.spatial.distance import cosine

# get_text_features uses only the text tower, so no dummy image is needed;
# cosine distance is scale-invariant, so feature normalization does not matter.
inputs_text = processor(text=["cat"], return_tensors="pt", padding=True, truncation=True)
with torch.no_grad():
    text_features = model.get_text_features(**inputs_text)
text_features = text_features.squeeze().detach().numpy()

# Compute Cosine distances for retrieval
cosine_dists = [cosine(text_features, img_feature) for img_feature in loaded_image_features]
top_10_cosine_indices = np.argsort(cosine_dists)[:10]
top_10_images = [Image.open(image_paths[i]) for i in top_10_cosine_indices]

fig, axes = plt.subplots(4, 3, figsize=(9, 12))
axes = axes.ravel()
for i in range(10):
    image = top_10_images[i]
    axes[i].imshow(np.asarray(image))
    axes[i].set_title(loaded_image_labels[top_10_cosine_indices[i]])
    axes[i].axis('off')

# Turn off the axes for the remaining two positions
for i in range(10, 12):
    axes.ravel()[i].axis('off')

plt.tight_layout()
plt.show()
In [27]:
# Write code to retrieve "cat face" here. 
# Display the first 10 retrieved images.
# [10 marks].

loaded_image_features = np.load('image_features.npy')
loaded_image_labels = np.load('image_labels.npy')

inputs_text = processor(text=["cat face"], return_tensors="pt", padding=True, truncation=True)
with torch.no_grad():
    text_features = model.get_text_features(**inputs_text)
text_features = text_features.squeeze().detach().numpy()  

# Compute Euclidean distances for retrieval
euclidean_dists = [euclidean(text_features, img_feature) for img_feature in loaded_image_features]
top_10_euclidean_indices = np.argsort(euclidean_dists)[:10]  
top_10_images = [Image.open(image_paths[i]) for i in top_10_euclidean_indices]

fig, axes = plt.subplots(4, 3, figsize=(9, 12))
axes = axes.ravel()
for i in range(10):
    image = top_10_images[i]
    axes[i].imshow(np.asarray(image))
    axes[i].set_title(loaded_image_labels[top_10_euclidean_indices[i]])
    axes[i].axis('off')
    
# Turn off the axes for the remaining two positions
for i in range(10, 12):
    axes.ravel()[i].axis('off')

plt.tight_layout()
plt.show()
In [28]:
# Cosine similarity -> same approach as for "cat"
from scipy.spatial.distance import cosine

# get_text_features uses only the text tower, so no dummy image is needed.
inputs_text = processor(text=["cat face"], return_tensors="pt", padding=True, truncation=True)
with torch.no_grad():
    text_features = model.get_text_features(**inputs_text)
text_features = text_features.squeeze().detach().numpy()

# Compute Cosine distances for retrieval
cosine_dists = [cosine(text_features, img_feature) for img_feature in loaded_image_features]
top_10_cosine_indices = np.argsort(cosine_dists)[:10]
top_10_images = [Image.open(image_paths[i]) for i in top_10_cosine_indices]

fig, axes = plt.subplots(4, 3, figsize=(9, 12))
axes = axes.ravel()
for i in range(10):
    image = top_10_images[i]
    axes[i].imshow(np.asarray(image))
    axes[i].set_title(loaded_image_labels[top_10_cosine_indices[i]])
    axes[i].axis('off')
    
# Turn off the axes for the remaining two positions
for i in range(10, 12):
    axes.ravel()[i].axis('off')

plt.tight_layout()
plt.show()
In [29]:
# Write code to retrieve "panda playing" here. 
# Display the first 10 retrieved images 
# [10 marks].

loaded_image_features = np.load('image_features.npy')
loaded_image_labels = np.load('image_labels.npy')

inputs_text = processor(text=["panda playing"], return_tensors="pt", padding=True, truncation=True)
with torch.no_grad():
    text_features = model.get_text_features(**inputs_text)
text_features = text_features.squeeze().detach().numpy()

# Compute Euclidean distances for retrieval
euclidean_dists = [euclidean(text_features, img_feature) for img_feature in loaded_image_features]
top_10_euclidean_indices = np.argsort(euclidean_dists)[:10]
top_10_images = [Image.open(image_paths[i]) for i in top_10_euclidean_indices]


fig, axes = plt.subplots(4, 3, figsize=(9, 12))
axes = axes.ravel()
for i in range(10):
    image = top_10_images[i]
    axes[i].imshow(np.asarray(image))
    axes[i].set_title(loaded_image_labels[top_10_euclidean_indices[i]])
    axes[i].axis('off')
    
# Turn off the axes for the remaining two positions
for i in range(10, 12):
    axes.ravel()[i].axis('off')

plt.tight_layout()
plt.show()
In [30]:
# Cosine similarity -> same approach as for "cat"
from scipy.spatial.distance import cosine

# get_text_features uses only the text tower, so no dummy image is needed.
inputs_text = processor(text=["panda playing"], return_tensors="pt", padding=True, truncation=True)
with torch.no_grad():
    text_features = model.get_text_features(**inputs_text)
text_features = text_features.squeeze().detach().numpy()

# Compute Cosine distances for retrieval
cosine_dists = [cosine(text_features, img_feature) for img_feature in loaded_image_features]
top_10_cosine_indices = np.argsort(cosine_dists)[:10]
top_10_images = [Image.open(image_paths[i]) for i in top_10_cosine_indices]

fig, axes = plt.subplots(4, 3, figsize=(9, 12))
axes = axes.ravel()
for i in range(10):
    image = top_10_images[i]
    axes[i].imshow(np.asarray(image))
    axes[i].set_title(loaded_image_labels[top_10_cosine_indices[i]])
    axes[i].axis('off')

# Turn off the axes for the remaining two positions
for i in range(10, 12):
    axes.ravel()[i].axis('off')
    
plt.tight_layout()
plt.show()
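One observation worth noting about the two retrieval variants above: if the features are L2-normalized first, Euclidean and cosine retrieval produce the same ranking, since for unit vectors ||u − v||² = 2 − 2·cos(u, v). A small sketch with random vectors (illustrative data only, not the CLIP features):

```python
import numpy as np

rng = np.random.default_rng(0)
feats = rng.normal(size=(50, 8))   # stand-in feature matrix
query = rng.normal(size=8)         # stand-in query feature

# L2-normalize the rows and the query.
feats_n = feats / np.linalg.norm(feats, axis=1, keepdims=True)
query_n = query / np.linalg.norm(query)

# Euclidean distance on the normalized vectors.
euc = np.linalg.norm(feats_n - query_n, axis=1)
# Cosine distance = 1 - cosine similarity.
cos = 1.0 - feats_n @ query_n

# The two orderings agree.
assert np.array_equal(np.argsort(euc), np.argsort(cos))
```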

D) Divide the above features into training and testing sets. The training set should be composed of 80% of the features; the testing set of the remaining 20%. Use the training set to learn a kernel SVM, and report the accuracy on the testing set. Total of [30 marks].

In [58]:
# Write code to divide features into training and testing sets 
# [10 marks].

from sklearn.model_selection import train_test_split
from sklearn import svm
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score
import numpy as np
from sklearn.preprocessing import StandardScaler

features = np.load('image_features.npy')
labels = np.load('image_labels.npy')

X_train, X_test, y_train, y_test = train_test_split(features, labels, test_size=0.2, random_state=42)
In [59]:
# Write code to train the kernel SVM 
# [10 marks]. 

# Scale the features first, then fit the SVM on the scaled training set.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)

clf = svm.SVC(kernel='rbf', gamma='scale')
clf.fit(X_train, y_train)
In [60]:
# Write code to evaluate the kernel SVM.
# Accuracy should be above 99%. 
# [10 marks].

X_test = scaler.transform(X_test)

clf = svm.SVC(kernel='rbf', gamma='scale')

clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy * 100:.2f}%')
Accuracy: 99.50%
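The same train/evaluate flow can be sketched with a scikit-learn `Pipeline`, which bundles the scaler and the SVM so the scaler is always fitted on the training split only (shown here on synthetic data; the real CLIP features and labels would replace `make_classification`):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the CLIP features and labels.
X, y = make_classification(n_samples=600, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Scaler + RBF-kernel SVM as a single estimator; predict() applies
# the fitted scaling to X_test automatically.
clf = make_pipeline(StandardScaler(), SVC(kernel='rbf', gamma='scale'))
clf.fit(X_train, y_train)

accuracy = accuracy_score(y_test, clf.predict(X_test))
```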

Bonus: Use a one-class SVM to learn a manifold that encapsulates the features of the cat class. Show that the learned manifold separates the test cat features from the other features. Evaluation should be in terms of AUROC. The solution requires substantial reading beyond the class syllabus. [10 marks].

In [61]:
# write code for training and testing one-class SVM 
# auroc should be above 98%
# [10 marks]

from sklearn import svm
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import numpy as np

binary_labels = np.array([1 if label == 'cats' else 0 for label in loaded_image_labels])

# Train only on the cat features; evaluate on all features.
# (Note that the cat training features are also included in the test set here.)
X_train = loaded_image_features[binary_labels == 1]
X_test = loaded_image_features
y_test = binary_labels

clf = svm.OneClassSVM(nu=0.1, kernel="rbf", gamma=0.1)
clf.fit(X_train)
svm_scores = clf.decision_function(X_test)

# Compute the AUROC
auroc = roc_auc_score(y_test, svm_scores)
print(f'AUROC score: {auroc}')
AUROC score: 0.988131
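`roc_curve` is imported above but never used; the curve underlying the AUROC could be computed and plotted along these lines (synthetic labels and decision scores here, purely for illustration):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score

# Synthetic decision scores: positives score higher on average.
rng = np.random.default_rng(0)
y_true = np.concatenate([np.ones(100), np.zeros(200)])
scores = np.concatenate([rng.normal(1.0, 1.0, 100),
                         rng.normal(-1.0, 1.0, 200)])

fpr, tpr, thresholds = roc_curve(y_true, scores)
auroc = roc_auc_score(y_true, scores)

plt.plot(fpr, tpr, label=f'AUROC = {auroc:.3f}')
plt.plot([0, 1], [0, 1], linestyle='--')  # chance-level diagonal
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.legend()
```

For the one-class SVM above, `y_true` would be `binary_labels` and `scores` would be `svm_scores`.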
In [ ]:
# Since the AUROC score is above 98%, the one-class SVM has learned a
# boundary that effectively separates the test cat features from the other
# features; a high AUROC indicates strong separation between the two classes.